How to remove duplicate rows in SQL?

Question

Aryan Kumar · Answer

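SQL offers two ways to handle duplicate rows. To leave them out of a query result, use the DISTINCT keyword or the GROUP BY clause; neither changes the table. To remove them from the table permanently, use a DELETE statement. The three common approaches are: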

Using DISTINCT:

If you want to select distinct rows based on all columns, you can use the DISTINCT keyword:

SELECT DISTINCT * 
FROM your_table;

This query returns each distinct row once, comparing all columns of the specified table. The table itself is not modified; the duplicates are only left out of the result.
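To make the difference between the result set and the stored data concrete, here is a minimal sketch that runs the DISTINCT query against an in-memory SQLite database through Python's sqlite3 module. The columns (name, city) and the sample rows are hypothetical and not part of the original answer.

```python
import sqlite3

# Hypothetical table and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("Ana", "Lima"), ("Ana", "Lima"), ("Ben", "Oslo")],
)

# DISTINCT collapses identical rows in the result set only.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM your_table ORDER BY name"
).fetchall()

# The table still holds every row that was inserted.
stored_count = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]

print(distinct_rows)
print(stored_count)
```

The query result contains two rows, while the count shows that all three rows are still stored.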

Using GROUP BY:

If you want one row per unique combination of specific columns, rather than of all columns, you can use the GROUP BY clause:

SELECT column1, column2, ..., columnN
FROM your_table
GROUP BY column1, column2, ..., columnN;

Replace column1, column2, ..., columnN with the columns you want to consider for uniqueness. This query will return one row for each unique combination of the specified columns. Like DISTINCT, it affects only the query result, not the stored rows.
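As a rough illustration, the sketch below groups a hypothetical table by two columns and also adds a COUNT(*) with a HAVING filter, which is a common way to see which combinations are duplicated before deciding what to delete. The table layout and data are assumptions made for this example, and it runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table: id differs per row, but (name, city) can repeat.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "Ana", "Lima"), (2, "Ana", "Lima"), (3, "Ben", "Oslo")],
)

# One row per unique (name, city) combination.
unique_combos = conn.execute(
    "SELECT name, city FROM your_table GROUP BY name, city ORDER BY name"
).fetchall()

# Combinations that occur more than once, with their number of copies.
duplicate_groups = conn.execute(
    "SELECT name, city, COUNT(*) FROM your_table "
    "GROUP BY name, city HAVING COUNT(*) > 1"
).fetchall()

print(unique_combos)
print(duplicate_groups)
```

Here SELECT DISTINCT * would not help, because the id column makes every row different; grouping on name and city is what exposes the duplicates.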

Removing Duplicates and Keeping One Copy:

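If you want to delete the duplicate rows from the table and keep only one copy, you can use the DELETE statement with a common table expression (CTE) and the ROW_NUMBER() window function. The syntax below is for SQL Server, which allows deleting through a CTE; PostgreSQL and MySQL do not accept a CTE as the DELETE target: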

WITH CTE AS (
  SELECT 
    column1, column2, ..., columnN,
    ROW_NUMBER() OVER (PARTITION BY column1, column2, ..., columnN ORDER BY (SELECT NULL)) AS RowNum
  FROM your_table
)
DELETE FROM CTE WHERE RowNum > 1;

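In this example, the ROW_NUMBER() function numbers the rows within each partition sequentially, starting at 1. The PARTITION BY clause lists the columns that define a duplicate, so rows with identical values in those columns share a partition. Because ORDER BY (SELECT NULL) imposes no real ordering, which copy receives number 1 is arbitrary; order by a column such as an ID or timestamp if you need to keep a specific copy. The DELETE statement then removes rows with RowNum greater than 1, keeping one copy of each unique combination.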
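For databases that do not allow deleting through a CTE, one possible variation is to use ROW_NUMBER() in a subquery to collect the keys of the extra copies and delete by key. The sketch below tries this on SQLite through Python's sqlite3 module. It assumes a hypothetical id primary key column, hypothetical name and city columns, and an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical table with a primary key so individual copies can be targeted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (name, city) VALUES (?, ?)",
    [("Ana", "Lima"), ("Ana", "Lima"), ("Ben", "Oslo"), ("Ana", "Lima")],
)

# Number the rows in each (name, city) partition, lowest id first,
# then delete every row whose number is greater than 1.
conn.execute(
    """
    DELETE FROM your_table
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY name, city ORDER BY id
                   ) AS RowNum
            FROM your_table
        )
        WHERE RowNum > 1
    )
    """
)

remaining = conn.execute(
    "SELECT id, name, city FROM your_table ORDER BY id"
).fetchall()
print(remaining)
```

Ordering by id inside the window makes the choice deterministic: the copy with the lowest id in each partition is the one that survives.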

Remember to replace column1, column2, ..., columnN with the actual column names in your table. Also, be cautious when performing deletions, especially if the table contains important data. Consider taking a backup before making changes, or test the query on a smaller dataset first.
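One way to test a deletion safely is to run it inside a transaction, inspect the effect, and roll back. The sketch below does this on a small hypothetical SQLite table through Python's sqlite3 module. Note that it swaps in a different delete technique from the one above: it keeps the row with the smallest id per group using MIN(id) and GROUP BY, which does not need window functions. The table, columns, and data are assumptions for illustration.

```python
import sqlite3

# Hypothetical small dataset used as a test bed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (name, city) VALUES (?, ?)",
    [("Ana", "Lima"), ("Ana", "Lima"), ("Ben", "Oslo")],
)
conn.commit()  # persist the sample data before experimenting


def row_count(connection):
    """Return the number of rows currently visible in your_table."""
    return connection.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]


# sqlite3 implicitly opens a transaction before the DELETE runs.
cursor = conn.execute(
    """
    DELETE FROM your_table
    WHERE id NOT IN (
        SELECT MIN(id) FROM your_table GROUP BY name, city
    )
    """
)
deleted = cursor.rowcount
count_after_delete = row_count(conn)

# Undo the deletion; use conn.commit() instead once the result looks right.
conn.rollback()
count_after_rollback = row_count(conn)

print(deleted, count_after_delete, count_after_rollback)
```

Checking the number of deleted rows before committing is a cheap sanity check against a PARTITION BY or GROUP BY list that is too broad.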


Revati S Misra
